Musite, a Tool for Global Prediction of General and Kinase-specific Phosphorylation Sites
Abstract
Reversible protein phosphorylation is one of the most pervasive post-translational modifications, regulating diverse cellular processes in various organisms. High throughput experimental studies using mass spectrometry have identified many phosphorylation sites, primarily from eukaryotes. However, the vast majority of phosphorylation sites remain undiscovered, even in well studied systems. Because mass spectrometry-based experimental approaches for identifying phosphorylation events are costly, time-consuming, and biased toward abundant proteins and proteotypic peptides, in silico prediction of phosphorylation sites is potentially a useful alternative strategy for whole proteome annotation. Because of various limitations, current phosphorylation site prediction tools were not well designed for comprehensive assessment of proteomes. Here, we present a novel software tool, Musite, specifically designed for large scale predictions of both general and kinase-specific phosphorylation sites. We collected phosphoproteomics data in multiple organisms from several reliable sources and used them to train prediction models by a comprehensive machine learning approach that integrates local sequence similarities to known phosphorylation sites, protein disorder scores, and amino acid frequencies. Application of Musite on several proteomes yielded tens of thousands of phosphorylation site predictions at a high stringency level. Cross-validation tests show that Musite achieves some improvement over existing tools in predicting general phosphorylation sites, and it is at least comparable with those for predicting kinase-specific phosphorylation sites. In Musite V1.0, we have trained general prediction models for six organisms and kinase-specific prediction models for 13 kinases or kinase families.
Although the current pretrained models were not correlated with any particular cellular conditions, Musite provides a unique functionality for training customized prediction models (including condition-specific models) from users’ own data. In addition, with its easily extensible open source application programming interface, Musite is aimed at being an open platform for community-based development of machine learning-based phosphorylation site prediction applications. Musite is available at http://musite.sourceforge.net/. Molecular & Cellular Proteomics 9:2586–2600, 2010.
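As an illustration of one of the three feature types named in the abstract (amino acid frequencies), the following minimal Python sketch computes residue frequencies in a window around each candidate serine, threonine, or tyrosine. It is not Musite's implementation or API: the function names, the 7-residue flank, and the exclusion of the central residue are assumptions made only for this example.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Residues that can be phosphorylated in the usual S/T/Y setting.
PHOSPHO_RESIDUES = "STY"


def candidate_sites(sequence):
    """Return 0-based positions of every S, T, or Y in the sequence."""
    return [i for i, aa in enumerate(sequence) if aa in PHOSPHO_RESIDUES]


def window_frequencies(sequence, position, flank=7):
    """Amino acid frequencies in the residues flanking `position`.

    The flank size (7) is an assumed value for illustration. The central
    residue is excluded, and the window is truncated at sequence ends.
    """
    start = max(0, position - flank)
    end = min(len(sequence), position + flank + 1)
    window = sequence[start:position] + sequence[position + 1:end]
    counts = Counter(window)
    total = len(window) or 1  # avoid division by zero for 1-residue input
    return {aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS}


if __name__ == "__main__":
    protein = "MKSPAATLEYQR"  # hypothetical sequence, not real data
    for pos in candidate_sites(protein):
        freqs = window_frequencies(protein, pos, flank=3)
        nonzero = {aa: round(f, 2) for aa, f in freqs.items() if f > 0}
        print(pos + 1, protein[pos], nonzero)
```

A full predictor in the spirit of the abstract would combine such frequency vectors with sequence-similarity and disorder scores before training a classifier; those components are omitted here.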